recent history
WaterBench: Towards Holistic Evaluation of Watermarks for Large Language Models
Tu, Shangqing, Sun, Yuliang, Bai, Yushi, Yu, Jifan, Hou, Lei, Li, Juanzi
To mitigate the potential misuse of large language models (LLMs), recent research has developed watermarking algorithms, which restrict the generation process to leave an invisible trace for watermark detection. Because the task has two stages, most studies evaluate generation and detection separately, which makes unbiased, thorough, and applicable evaluation difficult. In this paper, we introduce WaterBench, the first comprehensive benchmark for LLM watermarks, in which we design three crucial factors: (1) For the benchmarking procedure, to ensure an apples-to-apples comparison, we first adjust each watermarking method's hyper-parameter to reach the same watermarking strength, then jointly evaluate their generation and detection performance. (2) For task selection, we diversify the input and output length to form a five-category taxonomy covering 9 tasks. (3) For the evaluation metric, we adopt GPT4-Judge to automatically evaluate the decline in instruction-following ability after watermarking. We evaluate 4 open-source watermarks on 2 LLMs under 2 watermarking strengths and observe that current methods commonly struggle to maintain generation quality. The code and data are available at https://github.com/THU-KEG/WaterBench.
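The abstract does not spell out how a watermark is detected. As an illustration only, the sketch below assumes the green-list scheme of Kirchenbauer et al. (2023), one well-known family of LLM watermarks: a secret key and the previous token mark roughly a fraction γ of candidate tokens as "green", generation is nudged toward them, and the detector applies a one-proportion z-test, z = (|s|_G − γT) / √(Tγ(1 − γ)), where |s|_G is the number of green tokens among the T scored tokens. The hash-based `is_green` is a simplified stand-in for a seeded vocabulary split, and every name, the key, and γ = 0.25 are hypothetical choices, not WaterBench's code.

```python
import hashlib
import math
from typing import Sequence


def is_green(prev_token: int, token: int, gamma: float = 0.25, key: str = "demo-key") -> bool:
    # Hypothetical keyed hash standing in for a seeded vocabulary split:
    # (key, previous token, candidate token) -> pseudo-random u in [0, 1).
    digest = hashlib.sha256(f"{key}:{prev_token}:{token}".encode()).digest()
    u = int.from_bytes(digest[:8], "big") / 2**64
    return u < gamma  # roughly a gamma fraction of candidates count as "green"


def z_score(green: int, t: int, gamma: float) -> float:
    # One-proportion z-test: observed green count vs. gamma * t expected by chance.
    return (green - gamma * t) / math.sqrt(t * gamma * (1 - gamma))


def watermark_z_score(tokens: Sequence[int], gamma: float = 0.25, key: str = "demo-key") -> float:
    t = len(tokens) - 1  # every scored token needs a predecessor
    if t < 1:
        raise ValueError("need at least two tokens")
    green = sum(is_green(p, c, gamma, key) for p, c in zip(tokens, tokens[1:]))
    return z_score(green, t, gamma)


# Toy "watermarked" generator (hypothetical): always emit the first green token id.
seq = [0]
for _ in range(200):
    seq.append(next(tok for tok in range(1000) if is_green(seq[-1], tok)))
print(round(watermark_z_score(seq), 2))  # prints 24.49
```

A detector would flag text whose z-score exceeds a threshold such as 4. In schemes like this, γ and the size of the bias toward green tokens govern how detectable the watermark is; that is the kind of hyper-parameter the benchmarking procedure above tunes so that different methods reach the same watermarking strength.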
- Africa > Ghana (0.05)
- Oceania > Australia (0.04)
- North America > United States > Texas (0.04)
- Personal (0.92)
- Research Report > New Finding (0.46)
- Materials > Metals & Mining > Gold (1.00)
- Information Technology > Security & Privacy (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
AI Blog Post and Artwork
I began with a prompt in #ChatGPT "Can you write me a blog post about the recent history of coffee culture in China?" I then headed over to #DallE to create some images for the post. I used this prompt, "A watercolor of a coffee shop." I picked one I liked and had it generate some variations. Here is the blog post with images fully created by #AI.
GPT-3: The biggest breakthrough in AI in recent history
GPT-1 was released on June 11, 2018. Its release by OpenAI generated much excitement: it combined the transformer architecture with unsupervised pre-training, with promising results. What set GPT-1 apart from earlier language models was that, after pre-training on unlabeled text, it was fine-tuned, or further trained, for specific tasks. GPT-2 was introduced in February 2019.
How Alexa Is Learning to Converse More Naturally : Alexa Blogs
To handle more-natural spoken interactions, Alexa must track references through several rounds of conversation. If, for instance, a customer says, "How far is it to Redmond?" and after the answer follows up by saying, "Find good Indian restaurants there", Alexa should be able to infer that "there" refers to Redmond. We call the task of reference tracking "context carryover," and it is a capability that is currently being phased into the Alexa experience. At this year's Interspeech, the largest conference on spoken-language understanding, my colleagues and I will present a paper titled "Contextual Slot Carryover for Disparate Schemas," which describes our solution to the problem of slot carryover, a crucial aspect of context carryover. "Domain" describes the type of application (or "skill") that the utterance should invoke; for instance, mapping skills should answer questions about geographic distance.
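To make "slot carryover across disparate schemas" concrete, here is a toy, rule-based sketch. It is not the paper's method: the post only says the paper describes a solution, and this sketch does not reproduce it. It assumes a hypothetical hand-written table saying which slot in one skill (a mapping skill's `destination`) may fill a slot in another skill (a restaurant skill's `city`), and it resolves "there" by scanning earlier turns from most recent to oldest. All skill names, slot names, and the `CARRYOVER_MAP` table are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# Hypothetical cross-schema table: (target skill, target slot) -> slots from
# other skills whose values may be carried over into it.
CARRYOVER_MAP: Dict[Tuple[str, str], List[Tuple[str, str]]] = {
    ("restaurants", "city"): [("mapping", "destination"), ("weather", "city")],
}


@dataclass
class DialogueContext:
    # Slots filled in earlier turns, oldest first: (skill, slot, value).
    history: List[Tuple[str, str, str]] = field(default_factory=list)

    def add_turn(self, skill: str, slots: Dict[str, str]) -> None:
        for slot, value in slots.items():
            self.history.append((skill, slot, value))

    def carry_over(self, skill: str, slot: str) -> Optional[str]:
        # Accept the same slot in the same skill, or any mapped slot from another skill.
        candidates = set(CARRYOVER_MAP.get((skill, slot), [])) | {(skill, slot)}
        # Prefer the most recent compatible value.
        for src_skill, src_slot, value in reversed(self.history):
            if (src_skill, src_slot) in candidates:
                return value
        return None


ctx = DialogueContext()
ctx.add_turn("mapping", {"destination": "Redmond"})  # "How far is it to Redmond?"
# "Find good Indian restaurants there": fill the restaurant skill's city slot from context.
print(ctx.carry_over("restaurants", "city"))  # prints Redmond
```

A fixed table like this breaks down as the number of skills and slot names grows, which is why a learned approach to carryover between disparate schemas is attractive.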
- Retail > Online (0.40)
- Consumer Products & Services > Restaurants (0.36)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.49)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Rule-Based Reasoning (0.40)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.38)
U.N. hopes to build momentum for humanitarian aid in Syria if cease-fire holds
As a cease-fire in Syria entered its third day, authorities said Wednesday that the government was to begin withdrawing forces from a strategic road in the besieged rebel-held area of Aleppo, a move that would open the way for U.N. relief shipments. Syrian forces, with Russian air support, closed the road in July, thereby completing the encirclement of rebel districts with a population of as many as 300,000. A withdrawal south of Castello Road was to begin Thursday morning, according to Russian state news agency TASS. Rebel forces reportedly were to retreat simultaneously to positions north of the road. But it remained unclear whether residents of the city would accept the sacks of flour and food baskets the United Nations hopes to deliver.
- Asia > Russia (0.39)
- Asia > Middle East > Syria > Aleppo Governorate > Aleppo (0.28)
- North America > United States > North Carolina > Buncombe County > Asheville (0.06)
- Government > Military (1.00)
- Government > Regional Government > North America Government > United States Government (0.76)
- Government > Regional Government > Asia Government (0.71)